Infinite-Order Percolation and Giant Fluctuations in a Protein Interaction Network 
We investigate a model protein interaction network whose links represent interactions between individual proteins. This network evolves by the functional duplication of proteins, supplemented by random link addition to account for mutations. When link addition is dominant, an infinite-order percolation transition arises as a function of the addition rate. In the opposite limit of high duplication rate, the network exhibits giant structural fluctuations in different realizations. For biologically relevant growth rates, the node degree distribution has an algebraic tail with a peculiar rate dependence for the associated exponent.

Inter-protein interactions underlie the performance of vital biological functions. Organisms with sequenced genomes, such as the yeast S. cerevisiae [1], provide important test beds for analyzing protein interaction networks [2]. The distribution of the number of interactions per protein of S. cerevisiae follows a power law [3-5], a feature common to many complex networks, such as the Internet, the World Wide Web, and metabolic networks [6]. Similar behavior is exhibited by the protein interaction networks of various bacteria [7]. Based on the observational data, simple proteome growth models have recently been formulated to account for the evolution of this interaction network [8-11], where proteins are viewed as the nodes of a graph and links connect functionally related proteins.

In this work, we determine the structure of a minimal protein interaction network model that evolves by the biologically inspired processes of protein duplication and subsequent mutation. That is, the functionality of a duplicate protein is similar, but not identical, to that of the original and can gradually evolve with time due to mutations [4]. Within a rate equation approach [12,13], we show that: (i) the system undergoes an infinite-order percolation transition as a function of the mutation rate, with a rate-dependent power-law cluster-size distribution everywhere below the threshold; (ii) there are giant fluctuations in network structure and no self-averaging for large duplication rate; and (iii) the degree distribution has an algebraic tail with a peculiar rate-dependent exponent when the duplication and mutation rates have biologically realistic values. Some aspects of this last result have been observed recently [10,11].

In the model, nodes are added sequentially, and each new node duplicates a randomly chosen pre-existing "target" node; that is, the new node links to each of the neighbors of the target with probability $1-\delta$. Each new node also links to any previous node with probability $\beta/N$, where $N$ is the current total number of nodes (Fig. 1). Thus an arbitrary number of clusters can merge when a single node is introduced. As we now discuss, this unusual dynamics appears to be responsible for the unconventional percolation properties of this network in the limit of zero duplication rate but finite mutation rate ($\delta = 1$, $\beta > 0$).
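The growth rule above translates directly into a simulation loop. The sketch below is a minimal illustration, not the authors' simulation code: the function name `grow_network`, the adjacency-set representation, and the two-node starting graph are assumptions made here for concreteness.

```python
import random


def grow_network(n_final, delta, beta, seed=None):
    """Grow one realization of the duplication-mutation network (illustrative sketch).

    Assumed initial condition: two nodes joined by a single link.
    Returns a list of neighbor sets, one per node.
    """
    rng = random.Random(seed)
    adj = [{1}, {0}]
    while len(adj) < n_final:
        n = len(adj)                          # current number of nodes N
        target = rng.randrange(n)             # randomly chosen "target" node
        # duplication: copy each link of the target with probability 1 - delta
        links = {v for v in adj[target] if rng.random() < 1.0 - delta}
        # mutation: link to each pre-existing node with probability beta / N
        links |= {v for v in range(n) if rng.random() < beta / n}
        adj.append(links)                     # the new node receives index n
        for v in links:
            adj[v].add(n)
    return adj


if __name__ == "__main__":
    g = grow_network(1000, delta=0.53, beta=0.06, seed=1)
    mean_degree = sum(len(nbrs) for nbrs in g) / len(g)
    print(f"N = {len(g)}, mean degree = {mean_degree:.3f}")
```

The mutation step costs O(N) per added node; drawing a binomial number of random partners would be faster for large N, but the direct loop mirrors the rule exactly as stated.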

FIG. 1. Growth steps of the protein interaction network: The new node duplicates 2 out of the 3 links between the target node (shaded) and its neighbors. Each successful duplication occurs with probability $1-\delta$ (solid lines). The new node also attaches to any other network node with probability $\beta/N$ (dotted lines). Thus 3 previously disconnected clusters are joined by the complete event.

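Let $C_s(N)$ be the expected number of clusters of size $s \ge 1$. Without duplication, the links of a new node are purely random: in the large-$N$ limit their number is Poisson distributed with mean $\beta$, and each such link lands in a given cluster of size $s$ with probability $s/N$. The cluster size distribution therefore obeys the rate equation

$$\frac{dC_s}{dN} = -\frac{\beta s C_s}{N} + e^{-\beta}\sum_{n \ge 0}\frac{\beta^n}{n!}\sum_{s_1,\ldots,s_n}\prod_{i=1}^{n}\frac{s_i C_{s_i}}{N}, \tag{1}$$

where the sum is over all $s_1 \ge 1, \ldots, s_n \ge 1$ such that $s_1 + \cdots + s_n + 1 = s$. The first term on the right-hand side of Eq. (1) accounts for the loss of $C_s$ due to the linking of a cluster of size $s$ with the newly introduced node. The gain term accounts for all merging processes of $n$ initially separated clusters whose total size is $s - 1$; the $n = 0$ term describes the creation of an isolated node.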
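The statement that the $C_s(N)$ grow linearly in $N$ can be turned into a numerical check. Assuming Eq. (1) consists of a loss term $-\beta s C_s/N$ plus a gain term in which a Poisson($\beta$) number of clusters, each of size $s_i$ chosen with weight $s_i C_{s_i}/N$, merge with the new node, the ansatz $C_s = N c_s$ gives the recursion $(1+\beta s)\,c_s = e^{-\beta}\,[x^{s-1}]\exp(\beta\sum_j j c_j x^j)$, which can be solved one size at a time. The sketch below (hypothetical function name) uses the standard power-series recurrence for the exponential.

```python
import math


def cluster_densities(beta, s_max):
    """Stationary cluster densities c_s = C_s / N for s = 1..s_max (index 0 unused).

    Solves (1 + beta*s) c_s = exp(-beta) * [x^(s-1)] exp(beta * P(x)),
    where P(x) = sum_j j c_j x^j; the coefficient of x^(s-1) only
    involves c_1..c_{s-1}, so the densities follow one at a time.
    """
    c = [0.0] * (s_max + 1)
    p = [0.0] * (s_max + 1)   # p[j] = j * c_j, coefficients of P(x)
    e = [0.0] * s_max         # e[m] = [x^m] exp(beta * P(x))
    e[0] = 1.0
    for s in range(1, s_max + 1):
        m = s - 1
        if m > 0:
            # E' = beta P' E  gives  m E_m = beta * sum_{j=1}^{m} j P_j E_{m-j}
            e[m] = beta * sum(j * p[j] * e[m - j] for j in range(1, m + 1)) / m
        c[s] = math.exp(-beta) * e[m] / (1.0 + beta * s)
        p[s] = s * c[s]
    return c


if __name__ == "__main__":
    beta = 0.2
    c = cluster_densities(beta, 800)
    finite_fraction = sum(s * c[s] for s in range(1, len(c)))
    mean_size = sum(s * s * c[s] for s in range(1, len(c)))
    print(f"beta = {beta}: sum s c_s = {finite_fraction:.6f}, <s> = {mean_size:.4f}")
```

Below the percolation threshold, $\sum_s s c_s$ should come out close to 1 (all nodes in finite clusters), and the truncated second moment approximates the mean cluster size.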

Solving for the first few $C_s(N)$, we see that they are all proportional to $N$. Thus, writing $C_s(N) = N c_s$ and introducing the generating function $g(z) = \sum_s s\,c_s\,e^{sz}$, Eq. (1) becomes

$$g' = \frac{e^{z+\beta(g-1)} - g}{\beta\left[1 - e^{z+\beta(g-1)}\right]}, \tag{2}$$

where $g' = dg/dz$. To detect the percolation transition, we use the fact that $g(0) = \sum_s s\,c_s$ is the fraction of nodes within finite clusters. Thus the size of the infinite cluster (the giant component) is $NG = N[1 - g(0)]$. Suppose that we are in the non-percolating phase; this means that $g(0) = 1$. In this regime, the average cluster size equals $\langle s\rangle = \sum_s s^2 c_s = g'(0)$. To determine $g'(0)$, we substitute the expansion $g(z) = 1 + z\,g'(0) + \ldots$ into Eq. (2) and take the $z \to 0$ limit. This yields the quadratic equation $g'(0) = [1 + \beta g'(0)]^2$, whose physical solution (the root that tends to 1 as $\beta \to 0$, where almost every cluster is an isolated node) is

$$g'(0) = \frac{1 - 2\beta - \sqrt{1 - 4\beta}}{2\beta^2}. \tag{3}$$

This has a real solution only for $\beta \le 1/4$, thus identifying the percolation threshold as $\beta_c = 1/4$. For $\beta > \beta_c$, we express $g'(0)$ in terms of the size of the giant component by setting $z = 0$ in Eq. (2) and using $g(0) = 1 - G$, to give

$$g'(0) = \frac{G - 1 + e^{-\beta G}}{\beta\left(1 - e^{-\beta G}\right)}. \tag{4}$$

When $\beta \to \beta_c$ from above, $G \to 0$; expanding Eq. (4) to lowest order in $G$ gives $\langle s\rangle \to (1 - \beta_c)/\beta_c^2 = 12$. On the other hand, Eq. (3) shows that $\langle s\rangle \to 4$ when $\beta \to \beta_c$ from below. Thus the average size of the finite clusters jumps discontinuously from 4 to 12 as $\beta$ passes through $\beta_c = 1/4$.
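The two limits of the mean finite-cluster size can be evaluated directly. The sketch below assumes that $g'(0)$ is the smaller root of $g'(0) = [1+\beta g'(0)]^2$ below threshold and, above threshold, the value obtained by setting $z = 0$ with $g(0) = 1 - G$; the function names are hypothetical. Since $G(\beta)$ is not available in closed form, the above-threshold function takes $G$ as an input.

```python
import math

BETA_C = 0.25  # percolation threshold


def mean_size_below(beta):
    """<s> = g'(0) for 0 < beta <= 1/4: the root of a = (1 + beta*a)**2 that tends to 1 as beta -> 0.

    math.sqrt raises ValueError for beta > 1/4, where no real root exists.
    """
    return (1.0 - 2.0 * beta - math.sqrt(1.0 - 4.0 * beta)) / (2.0 * beta**2)


def mean_size_above(beta, G):
    """g'(0) = (G - 1 + e^{-beta G}) / (beta (1 - e^{-beta G})) for a giant-component fraction G."""
    x = math.expm1(-beta * G)          # e^{-beta G} - 1, accurate even when beta*G is tiny
    return (G + x) / (-beta * x)


if __name__ == "__main__":
    print("limit from below:", mean_size_below(BETA_C))
    print("limit from above (G -> 0):", mean_size_above(BETA_C, 1e-7))
```

The below-threshold value at $\beta_c$ is exactly 4, and the $G \to 0$ limit above threshold approaches 12, reproducing the jump. Using `math.expm1` avoids the cancellation in $1 - e^{-\beta G}$ for small $G$.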

The cluster size distribution $c_s$ exhibits distinct behaviors below, at, and above the percolation transition. For $\beta < \beta_c$, the asymptotic behavior of $c_s$ can be read off from the behavior of the generating function as $z \to 0$ from below ($z < 0$). If $c_s$ has the power-law behavior

$$c_s \simeq B\,s^{-\tau} \tag{5}$$

as $s \to \infty$, then the corresponding generating function $g(z)$ has the following small-$z$ expansion:

$$g(z) = 1 + g'(0)\,z + B\,\Gamma(2-\tau)\,(-z)^{\tau-2} + \ldots \tag{6}$$



The regular terms are needed to reproduce the known zeroth and first derivatives of the generating function, while the asymptotic behavior is controlled by the dominant singular term $(-z)^{\tau-2}$. Higher-order regular terms are asymptotically irrelevant. Substituting this expansion into Eq. (2), we find that the dominant terms are of the order of $(-z)^{\tau-3}$. Balancing all contributions of this order gives $\tau = 1 + 1/\{\beta[1 + \beta g'(0)]\}$, which, with $g'(0)$ from Eq. (3), becomes

$$\tau = 1 + \frac{2}{1 - \sqrt{1 - 4\beta}}. \tag{7}$$

Intriguingly, a power-law cluster size distribution with a non-universal exponent arises for all $\beta < \beta_c$. In contrast to ordinary critical phenomena, the entire range $\beta < \beta_c$ is critical.

The power-law tail implies that the size of the largest cluster $s_{\max}$ grows as a power of the system size. From the extreme statistics criterion $\sum_{s \ge s_{\max}} N c_s = 1$ and the asymptotics of Eq. (5), we find $s_{\max} \propto N^{1/(\tau-1)}$, or $s_{\max} \propto N^{(1-\sqrt{1-4\beta})/2}$. In contrast, for conventional percolation below threshold, the largest cluster has size $s_{\max} \propto \ln N$, reflecting the exponential tail of the cluster size distribution [14].
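The exponent and the growth law of the largest cluster are simple closed forms. As a sketch (hypothetical function names), the code below evaluates $\tau(\beta) = 1 + 2/(1-\sqrt{1-4\beta})$, which follows from the balance condition $\tau = 1 + 1/\{\beta[1+\beta g'(0)]\}$ with the below-threshold root of $g'(0) = [1+\beta g'(0)]^2$, together with the exponent $1/(\tau-1)$ of $s_{\max}$.

```python
import math


def tau(beta):
    """Cluster-size exponent for 0 < beta <= 1/4: tau = 1 + 2 / (1 - sqrt(1 - 4 beta))."""
    return 1.0 + 2.0 / (1.0 - math.sqrt(1.0 - 4.0 * beta))


def s_max_exponent(beta):
    """Exponent of s_max ~ N^{1/(tau - 1)}, equal to (1 - sqrt(1 - 4 beta)) / 2."""
    return 1.0 / (tau(beta) - 1.0)


if __name__ == "__main__":
    for beta in (0.05, 0.1, 0.2, 0.25):
        print(f"beta = {beta:.2f}: tau = {tau(beta):.4f}, s_max ~ N^{s_max_exponent(beta):.4f}")
```

As $\beta \to 0$, $\tau$ grows like $1/\beta$, while at $\beta_c = 1/4$ the code returns $\tau = 3$ and an $s_{\max}$ exponent of $1/2$.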

At the transition, Eq. (7) gives $\tau = 3$. However, the naive asymptotics $c_s \propto s^{-3}$ cannot be correct, as it implies that $g'(0)$ diverges. Similarly, we cannot expand the generating function as in Eq. (6) with $\tau = 3$, since the singular term $\Gamma(-1)\,(-z)$ has an infinite prefactor. As in other situations where the order of a singular term coincides with that of a regular term, we anticipate a logarithmic correction. Thus consider the modified expansion $g(z) = 1 + Az + z\,u(z) + \ldots$, with $A = g'(0) = 4$, where $u(z)$ vanishes slower than any power of $z$ as $z \to 0$. Substituting this into Eq. (2), setting $\beta = \beta_c$, and equating singular terms yields $(8 + u)\,z u' + u^2 = 0$. Solving this differential equation asymptotically, we obtain the leading behavior $u \approx 8/\ln(-z)$; this indeed vanishes slower than any power of $z$ for $z \to 0$. Substituting this form for $u(z)$ in the modified expansion for $g(z)$ and inverting yields

$$c_s \simeq \frac{8}{s^3 (\ln s)^2}. \tag{8}$$

Thus, exactly at the transition, the cluster size distribution acquires a logarithmic correction. This result also implies that the size of the largest component scales as $s_{\max} \propto N^{1/2}/\ln N$.
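The asymptotic solution of $(8+u)\,zu' + u^2 = 0$ can be checked numerically. With $t = \ln(-z)$ one has $z\,u' = du/dt$, so the equation becomes $du/dt = -u^2/(8+u)$, and $z \to 0^-$ corresponds to $t \to -\infty$. The sketch below uses a plain fourth-order Runge-Kutta integrator (the starting point and step size are arbitrary choices, and the function names are hypothetical) to integrate toward large negative $t$ and compare $u$ with $8/t = 8/\ln(-z)$.

```python
def rhs(u):
    """du/dt = -u**2 / (8 + u): the threshold equation (8 + u) z u' + u**2 = 0 with t = ln(-z)."""
    return -u * u / (8.0 + u)


def integrate_u(t_start=-10.0, t_end=-1.0e4, h=-0.5):
    """Fourth-order Runge-Kutta from t_start toward t_end < t_start (i.e. z -> 0 from below)."""
    t, u = t_start, 8.0 / t_start         # start on the predicted curve u = 8 / t
    for _ in range(int(round((t_end - t_start) / h))):
        k1 = rhs(u)
        k2 = rhs(u + 0.5 * h * k1)
        k3 = rhs(u + 0.5 * h * k2)
        k4 = rhs(u + h * k3)
        u += h * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
        t += h
    return t, u


if __name__ == "__main__":
    for t_end in (-1.0e2, -1.0e3, -1.0e4):
        t, u = integrate_u(t_end=t_end)
        print(f"t = ln(-z) = {t:.0f}: u * t / 8 = {u * t / 8.0:.5f}")
```

The ratio $ut/8$ approaches 1 only slowly, with corrections of relative order $\ln|t|/|t|$, consistent with $u$ vanishing slower than any power of $z$.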

Above the percolation transition, both $g(0) = 1 - G$ and $g'(0)$ [Eq. (4)] are finite, so that the expansion for $g(z)$ has the form $g(z) = 1 - G + g'(0)\,z + \ldots$. Substituting this into Eq. (2), one can show that: (i) the full expansion of $g(z)$ is regular in $z$, and (ii) the generating function diverges at $z^* = 1/s^*$. This latter fact implies that $c_s \propto e^{-s/s^*}$ as $s \to \infty$. The location of the singularity is determined by the condition $e^{z^* + \beta[g(z^*) - 1]} = 1$. This gives $s^* \to 16/G$ as $\beta \to \beta_c$. Realistic protein interaction networks are always above the percolation transition; e.g., for yeast the giant component includes 54% of all nodes and 68% of the links of the system [3]. Thus a giant component always exists, and the cluster-size distribution has an exponential tail.
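The value $s^* \to 16/G$ follows from a one-line linearization, assuming that near threshold $z^*$ is small enough to use $g(z) \approx 1 - G + g'(0)\,z$ and that $g'(0) \to 12$ as $\beta \to \beta_c^+$. Taking the logarithm of the singularity condition gives $z^* + \beta[g(z^*) - 1] = 0$, so that

```latex
z^* = \beta\,[1 - g(z^*)] \approx \beta\,[G - g'(0)\,z^*]
\;\Longrightarrow\;
z^* \approx \frac{\beta G}{1 + \beta g'(0)} \to \frac{(1/4)\,G}{1 + 12/4} = \frac{G}{16},
\qquad s^* = \frac{1}{z^*} \approx \frac{16}{G}.
```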

The size of the giant component $G(\beta)$ is obtained by solving Eq. (2) near $z = 0$. A lengthy analysis [15] shows that, near the percolation threshold, $G(\beta)$ vanishes faster than any power of $\beta - \beta_c$, so that all derivatives of $G(\beta)$ vanish as $\beta \to \beta_c$. Thus the transition is of infinite order. Similar behavior has recently been observed [13,16-18] for several growing network models in which single nodes and links are introduced independently. This generic growth mechanism seems to give rise to fundamentally new percolation phenomena.

We now examine the complementary limit of no mutations ($\beta = 0$) and show that individual realizations of the evolution lead to widely differing results. Consider first the limit of deterministic duplication, $\delta = 0$, in which all the links of the duplicated protein are copied. There is still a stochastic element in this growth, as the node to be duplicated is chosen randomly. When $\delta = 0$, the rate equation approach [Eqs. (14)-(15) below] predicts that the degree distribution $N_k$ (defined as the number of nodes that are linked to $k$ other nodes) is given by $N_k = 2(1 - 2/N)^{k-1}$.

However, this "solution" does not correspond to the outcome of any single realization of the duplication process. To appreciate this, consider the simple and generic initial state of two nodes joined by a single link. We denote this graph as $K_{1,1}$, following the graph-theoretic terminology [19] in which $K_{n,m}$ denotes a complete bipartite graph where every node in the subgraph of size $n$ is linked to every node in the subgraph of size $m$. Duplicating one of the nodes in $K_{1,1}$ gives $K_{2,1}$ or $K_{1,2}$, equiprobably. In general, duplicating a node in the part of size $k$ of $K_{k,N-k}$ copies all its links to the other part and therefore yields $K_{k+1,N-k}$; since the target is chosen uniformly, this occurs with probability $k/N$. By continuing to duplicate nodes, one thus finds that at every stage the network remains a complete bipartite graph, say $K_{k,N-k}$, and that every value of $k = 1, \ldots, N-1$ occurs with equal probability (Fig. 2). Thus the degree distribution remains singular: it is always the sum of two delta functions!
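The equiprobability of the partition sizes is the classical Pólya-urn property and is easy to check by simulation. The sketch below (hypothetical function names) grows networks from $K_{1,1}$ with complete duplication ($\delta = 0$, $\beta = 0$), verifies that every realization is a complete bipartite graph, and tallies the size $k$ of the part containing node 0. It also averages the two-delta-function degree histograms exactly over $k = 1, \ldots, N-1$, using the fact that $K_{k,N-k}$ has $N-k$ nodes of degree $k$ and $k$ nodes of degree $N-k$.

```python
import random
from collections import Counter
from fractions import Fraction


def grow_complete_duplication(n_final, rng):
    """delta = 0, beta = 0: each new node copies every link of a random target."""
    adj = [{1}, {0}]                          # K_{1,1}
    while len(adj) < n_final:
        target = rng.randrange(len(adj))
        new = len(adj)
        adj.append(set(adj[target]))          # copy all links of the target
        for v in adj[new]:
            adj[v].add(new)
    return adj


def part_size(adj):
    """Size k of the part containing node 0 (node 1's neighbors form that part)."""
    return len(adj[1])


def is_complete_bipartite(adj):
    part_a = adj[1]                           # contains node 0
    part_b = set(range(len(adj))) - part_a    # contains node 1
    return all(adj[v] == part_b for v in part_a) and all(adj[v] == part_a for v in part_b)


def averaged_degree_histogram(n):
    """Exact <N_j> when k is uniform on 1..n-1 and the graph is K_{k, n-k}."""
    hist = Counter()
    for k in range(1, n):
        hist[k] += Fraction(n - k, n - 1)     # n - k nodes of degree k
        hist[n - k] += Fraction(k, n - 1)     # k nodes of degree n - k
    return hist


if __name__ == "__main__":
    rng = random.Random(12345)
    runs = 20000
    sizes = Counter(part_size(grow_complete_duplication(6, rng)) for _ in range(runs))
    print(sorted(sizes.items()))              # each k = 1..5 appears roughly runs / 5 times
    print(sorted(averaged_degree_histogram(6).items()))
```

The exact average works out to $2(N-j)/(N-1)$ nodes of degree $j$, a linear profile, even though no individual realization has that shape.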

FIG. 2. Evolution of the complete bipartite graph $K_{m,n}$ after one deterministic duplication event. Only the links emanating from the top nodes of each component are shown.



For fixed $N$, we average over all realizations of the evolution to obtain the average degree distribution. Since $N-k$ nodes of $K_{k,N-k}$ have degree $k$ and the other $k$ nodes have degree $N-k$, averaging over the equiprobable values of $k$ gives

$$\langle N_k\rangle = \frac{2(N-k)}{N-1}, \qquad 1 \le k \le N-1. \tag{10}$$

Computing $\langle N_k\rangle$ for other generic initial conditions, e.g., complete $m$-partite graphs and ring graphs [15], we find that the dependence on the initial condition persists throughout the evolution. More importantly, self-averaging breaks down: different realizations of the growth lead to statistically distinguishable networks. Similar giant fluctuations arise in the general case of imperfect duplication, where $\beta = 0$ and $\delta > 0$ [15]. To illustrate the origin of these macroscopic fluctuations, consider the network growth in the limit $\delta \ll 1$. The probability that the first few duplication steps are complete (all eligible links are created) is close to one. During this initial development, the degree of each node increases, and the probability of creating isolated nodes becomes very small as the network grows. On the other hand, if the first duplication event were totally incomplete, an isolated node would be created. The creation of isolated nodes necessarily leads to more isolated nodes through subsequent duplication events, because duplicating an isolated node always produces another isolated node. Thus the number of isolated nodes is a non-self-averaging quantity. In a similar fashion, the number of nodes of degree $k$ for any finite $k > 0$ is also non-self-averaging.
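The non-self-averaging of the number of isolated nodes can be explored by repeating the $\beta = 0$ growth with different random seeds. The sketch below (hypothetical function name; a $K_{1,1}$ starting graph is assumed) returns the fraction of isolated nodes in one realization. If this quantity is not self-averaging, the spread of the fraction between seeds should not shrink as $N$ grows; the specific numbers printed depend on the seeds and are not results from the paper.

```python
import random


def isolated_fraction(n_final, delta, seed):
    """Fraction of isolated nodes after growing to n_final nodes with beta = 0."""
    rng = random.Random(seed)
    adj = [{1}, {0}]                          # assumed start: K_{1,1}
    while len(adj) < n_final:
        n = len(adj)
        target = rng.randrange(n)
        # imperfect duplication only: each link copied with probability 1 - delta
        links = {v for v in adj[target] if rng.random() < 1.0 - delta}
        adj.append(links)                     # an isolated target always yields an isolated copy
        for v in links:
            adj[v].add(n)
    return sum(1 for nbrs in adj if not nbrs) / n_final


if __name__ == "__main__":
    for n_final in (100, 400):
        fractions = [isolated_fraction(n_final, delta=0.1, seed=s) for s in range(20)]
        print(f"N = {n_final}: min {min(fractions):.3f}, max {max(fractions):.3f}")
```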

Finally, we investigate the evolution of the network when both incomplete duplication and mutation occur ($\delta < 1$ and $\beta > 0$). Let us first determine the average node degree $D$ of the network for such general rates. In each growth step, the new node copies on average a fraction $1-\delta$ of the links of a target whose mean degree is $D$, and acquires on average $\beta$ random links; thus the average number of links $L$ increases by $\beta + (1-\delta)D$. Therefore, $L = [\beta + (1-\delta)D]N$. Combining this with $D = 2L/N$ gives [9,10]

$$D = \frac{2\beta}{2\delta - 1}, \tag{11}$$

a result that applies only when $\delta > \delta_c = 1/2$. To treat general $\delta$, note that the number of links obeys

$$\frac{dL}{dN} = \beta + 2(1-\delta)\frac{L}{N}, \tag{12}$$

and combining this with $D(N) = 2L(N)/N$, we find

$$D(N) \simeq \begin{cases} 2\beta/(2\delta-1), & \delta > 1/2,\\ 2\beta \ln N, & \delta = 1/2,\\ \text{const} \times N^{1-2\delta}, & \delta < 1/2. \end{cases} \tag{13}$$

Without mutation ($\beta = 0$), the average node degree always scales as $N^{1-2\delta}$, so that a realistic finite average degree is recovered only when $\delta = 1/2$. Thus mutations play a constructive role, as a finite average degree arises for any $\delta > 1/2$.
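The link-number equation is easy to iterate one node at a time. The sketch below (hypothetical function names; a single initial link between two nodes is assumed) uses the discrete update $L(N+1) = L(N) + \beta + 2(1-\delta)L(N)/N$ and reports $D = 2L/N$, which makes the three regimes visible: saturation at $2\beta/(2\delta-1)$ for $\delta > 1/2$, logarithmic growth at $\delta = 1/2$, and power-law growth $N^{1-2\delta}$ for $\delta < 1/2$.

```python
import math


def mean_degree(delta, beta, n_max, n0=2, links0=1.0):
    """Iterate L(N+1) = L(N) + beta + 2 (1 - delta) L(N) / N from N = n0; return D = 2 L / N at N = n_max."""
    links = links0
    for n in range(n0, n_max):
        links += beta + 2.0 * (1.0 - delta) * links / n
    return 2.0 * links / n_max


def predicted_mean_degree(delta, beta):
    """Stationary value 2 beta / (2 delta - 1); none exists (D grows with N) for delta <= 1/2."""
    return 2.0 * beta / (2.0 * delta - 1.0) if delta > 0.5 else math.inf


if __name__ == "__main__":
    for delta in (0.75, 0.5, 0.25):
        values = [round(mean_degree(delta, 0.5, n), 3) for n in (10**3, 10**4, 10**5)]
        print(f"delta = {delta}: D(N) = {values}, stationary = {predicted_mean_degree(delta, 0.5)}")
```

At $\delta = 1/2$ the increment of $D$ between two sizes approaches $2\beta$ times the logarithm of their ratio, and for $\delta > 1/2$ the approach to the stationary value is slow, with corrections decaying as $N^{-(2\delta-1)}$.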

We now consider the case $\delta > 1/2$ and $\beta > 0$ and apply the rate equation approach [12,13] to study the degree distribution $N_k(N)$. The degree $k$ of a node increases by one at a rate $A_k = (1-\delta)k + \beta$. The first term arises from duplication (one of the node's $k$ neighbors is chosen as the target, and the target's link to the node is copied with probability $1-\delta$), while mutation leads to the $k$-independent contribution. The rate equations for the degree distribution are therefore

$$\frac{dN_k}{dN} = \frac{A_{k-1}N_{k-1} - A_k N_k}{N} + G_k. \tag{14}$$

The first two terms account for processes in which the node degree increases by one. The source term $G_k$ describes the introduction of a new node with $k$ links, $a$ of which are created by duplication and $b = k - a$ by mutation. The probability of the former is $\sum_{s \ge a} n_s \binom{s}{a}(1-\delta)^a \delta^{s-a}$, where $n_s = N_s/N$ is the probability that a node of degree $s$ is chosen for duplication, while the probability of the latter is $\beta^b e^{-\beta}/b!$. Since duplication and random attachment are independent processes, the source term is

$$G_k = \sum_{a+b=k} \sum_{s \ge a} n_s \binom{s}{a} (1-\delta)^a \delta^{s-a}\, \frac{\beta^b e^{-\beta}}{b!}. \tag{15}$$

From Eq. (14), the $N_k$ grow linearly with $N$. Substituting $N_k(N) = N n_k$ in the rate equations yields

$$n_k = A_{k-1}n_{k-1} - A_k n_k + G_k. \tag{16}$$

Since $G_k$ depends on $n_s$ with $s > k$, the above equation is not a recursion. However, for large $k$, we can reduce it to a recursion by simple approximations. As $k \to \infty$, the main contribution to the sum in Eq. (15) arises when $b$ is small, so that $a$ is close to $k$, and the summand is sharply peaked around $s \approx k/(1-\delta)$. This simplifies the sum, as we may replace the lower limit by $s = k$ and $n_s$ by its value at $s = k/(1-\delta)$. Further, if $n_k$ decays as $k^{-\gamma}$, we write $n_s = (1-\delta)^{\gamma} n_k$ and simplify $G_k$ to

$$G_k \simeq (1-\delta)^{\gamma} n_k \sum_{s \ge k} \binom{s}{k} (1-\delta)^k \delta^{s-k} \sum_{b \ge 0} \frac{\beta^b e^{-\beta}}{b!} = (1-\delta)^{\gamma-1} n_k, \tag{17}$$

since the former binomial sum equals $(1-\delta)^{-1}$ and the latter Poisson sum equals 1.
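The approximation behind Eq. (17) can be checked by evaluating the source term directly, $G_k = \sum_{a+b=k}\sum_{s\ge a} n_s \binom{s}{a}(1-\delta)^a\delta^{s-a}\,\beta^b e^{-\beta}/b!$, for a pure power law $n_s = s^{-\gamma}$ and comparing $G_k/n_k$ with $(1-\delta)^{\gamma-1}$. The sketch below (hypothetical function name) truncates the sums where the terms are negligible and evaluates the binomial weights through `math.lgamma` to avoid overflow. The remaining deviation at finite $k$ is of order $1/k$, in line with the remark that the replacement of $n_s$ is valid only asymptotically.

```python
import math


def source_term(k, n, delta, beta, b_max=10):
    """G_k = sum_{a+b=k} sum_{s>=a} n(s) C(s,a) (1-delta)^a delta^(s-a) * beta^b e^-beta / b!.

    n is a callable giving n_s; b is cut at b_max (the Poisson tail is negligible for small beta),
    and s is cut where the negative-binomial weights have decayed to nothing.
    """
    log_p, log_q = math.log(1.0 - delta), math.log(delta)
    s_hi = int(3 * k / (1.0 - delta)) + 100
    total = 0.0
    for b in range(0, min(k, b_max) + 1):
        a = k - b
        poisson = math.exp(-beta) * beta**b / math.factorial(b)
        inner = 0.0
        for s in range(a, s_hi):
            log_weight = (math.lgamma(s + 1) - math.lgamma(a + 1) - math.lgamma(s - a + 1)
                          + a * log_p + (s - a) * log_q)
            inner += n(s) * math.exp(log_weight)
        total += poisson * inner
    return total


if __name__ == "__main__":
    gamma, delta, beta = 2.5, 0.53, 0.06

    def n(s):
        return s ** -gamma if s > 0 else 0.0

    for k in (100, 300, 1000):
        ratio = source_term(k, n, delta, beta) / n(k)
        print(f"k = {k}: G_k / n_k = {ratio:.5f} vs (1-delta)^(gamma-1) = {(1 - delta) ** (gamma - 1):.5f}")
```

With a constant $n_s$, the same function returns $1/(1-\delta)$, which is the negative-binomial identity used in Eq. (17).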

FIG. 3. Degree distribution $n_k$ versus $k$ for the protein interaction network with $\delta = 0.53$ and $\beta = 0.06$. Shown is the distribution for $N = 10^3$, $10^4$, and $10^6$ (bottom to top), with $10^4$, $10^3$, and 20 realizations, respectively. A straight (dotted) line with the predicted slope of $-2.37$ is shown for visual reference. The inset shows the degree distribution exponent $\gamma$ as a function of $\delta$ from the numerical solution of Eq. (18).

Thus for $k \to \infty$, Eq. (16) reduces to a recursion relation, from which we deduce that $n_k$ has the power-law behavior $n_k \sim k^{-\gamma}$. Approximating the difference $A_{k-1}n_{k-1} - A_k n_k$ by $-d(A_k n_k)/dk \simeq (1-\delta)(\gamma-1)\,n_k$ and using Eq. (17), Eq. (16) becomes $1 = (1-\delta)(\gamma-1) + (1-\delta)^{\gamma-1}$, so that $\gamma$ is determined from the relation

$$\gamma(\delta) = 1 + \frac{1}{1-\delta} - (1-\delta)^{\gamma-2}. \tag{18}$$

Notice that the replacement of $n_s$ by $(1-\delta)^{\gamma} n_k$ is valid only asymptotically. This explains the slow convergence of the degree distribution to the predicted power-law form (Fig. 3). Intriguingly, the exponent $\gamma(\delta)$ is independent of the mutation rate $\beta$ [20]. Nevertheless, the presence of mutations ($\beta > 0$) is vital to suppress the non-self-averaging as the network evolves and thus to make a smooth degree distribution possible. If we adopt $\delta = 0.53$, as suggested by observations [4], we obtain $\gamma = 2.373\ldots$, compared with the numerical simulation result $\gamma = 2.5 \pm 0.1$ [10].
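Equation (18) is transcendental in $\gamma$ and is readily solved numerically, as in the inset of Fig. 3. The sketch below uses bisection (a choice made here, not necessarily the method behind the figure; the function name is hypothetical) on $f(\gamma) = \gamma - 1 - 1/(1-\delta) + (1-\delta)^{\gamma-2}$. For $1/2 < \delta < 1$, $f$ is negative at $\gamma = 2$, positive at $\gamma = 1 + 1/(1-\delta)$, and convex, so the bracket contains exactly one root.

```python
def degree_exponent(delta, tol=1e-12):
    """Solve gamma = 1 + 1/(1 - delta) - (1 - delta)**(gamma - 2) by bisection, for 1/2 < delta < 1."""
    p = 1.0 - delta

    def f(g):
        return g - 1.0 - 1.0 / p + p ** (g - 2.0)

    lo, hi = 2.0, 1.0 + 1.0 / p           # f(lo) = 2 - 1/p < 0 and f(hi) = p**(hi - 2) > 0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    for delta in (0.53, 0.6, 0.7, 0.8, 0.9):
        print(f"delta = {delta:.2f}: gamma = {degree_exponent(delta):.4f}")
```

For $\delta = 0.53$ this returns $\gamma \approx 2.373$, the value quoted above.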



In summary, network growth by duplication and mutation leads to rich behavior, with an infinite-order percolation transition and no self-averaging in the absence of mutations. Without mutation, different realizations of the network lead to drastically different outcomes, and each outcome is itself singular. Mutations are needed to form networks that are statistically similar to observed protein interaction networks. Thus mutations seem to play a constructive role in forming robust networks whose functioning realizes the primary purpose of mutations.